import vipy
assert vipy.version.is_at_least('1.9.5')
V = vipy.util.load('/Users/jba3139/Desktop/pip_175k/valset.pkl')
This dataset uses multi-label activities with dense bounding box annotations. Each object may be performing zero or more activities simultaneously, and the framewise labels capture when an object is performing an activity in a given frame. This means that a person can simultaneously be performing two or more activities, such as "person_talks_on_phone" and "person_opens_facility_door". Overlapping labels can also arise from the MEVA annotation definitions, which introduce overlapping activities such as "vehicle_dropping_off" and "vehicle_stopping".
We recommend using the framewise labels to export tubelets of deforming bounding boxes over time for training. Examples of extracting labels and boxes from the toolchain are shown below.
v = V[1].mindim(512).load() # load a single video from the dataset
print(v) # Each video has useful information when printed
<vipy.video.scene: height=512, width=512, frames=223, color=rgb, filename="/Users/jba3139/Desktop/pip_175k/videos/car_drops_off_person/778838D5-92E0-49DB-B1C9-AEB05669D125-3542-00000294C2F31029_1.mp4", fps=30.0, category="car_drops_off_person", tracks=1, activities=3>
v[0].show() # show the first frame of video as an annotated image.
v[100].show() # 100th frame of video
v[200].show() # 200th frame of video
<vipy.image.scene: height=512, width=512, color=rgb, category="car_drops_off_person", objects=1>
im = v[100][0].show() # 100th frame, first object
im.crop().show() # crop this object using its bounding box
<vipy.image.imagedetection: height=256, width=170, color=rgb, category="Vehicle", bbox=(xmin=0.0, ymin=0.0, width=170.0, height=256.0)>
im.boundingbox() # return the bounding box for this object
<vipy.object.detection: category="Vehicle", bbox=(xmin=0.0, ymin=0.0, width=170.0, height=256.0)>
im.boundingbox().json() # as JSON for portability
'{"_xmin":0,"_ymin":0,"_xmax":170,"_ymax":256,"_id":"2cb7abdc3f104dc5b660704c0b6c90fd","_label":"Vehicle","_shortlabel":"Vehicle","_confidence":null,"attributes":{"trackid":"f9991ee6ecac11ea82acac1f6b2c363c","activityid":["f9991bb2ecac11ea82acac1f6b2c363c"],"noun verb":[["Vehicle","dropping off"]]}}'
# make square, crop and resize to 224x224
im = v[100][0].boxmap(lambda bb: bb.dilate(1.2).maxsquare()).crop().mindim(224).show()
print(v.activitylabels(0)) # The labels in frame 0
print(v.activitylabels(100)) # in frame 100
print(v.activitylabels(200)) # in frame 200
lbls = [(k,lbl) for (k,lbl) in enumerate(v.label())] # (frame index, label set) tuples
{'car_drops_off_person', 'car_stops'}
{'car_drops_off_person'}
{'car_drops_off_person', 'car_starts'}
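Framewise label sets like those printed above can be collapsed into per-label frame intervals, which is one way to prepare tubelet boundaries for export. The following is a minimal sketch using only the Python standard library; the `label_intervals` helper and the toy frame indices are illustrative assumptions and are not part of vipy.

```python
def label_intervals(framelabels):
    """Collapse (frame index, label set) tuples into {label: [(start, end), ...]} intervals.

    Frames are assumed to be consecutive and sorted; end frames are inclusive.
    """
    intervals, active = {}, {}  # active maps label -> start frame of its current run
    last = None
    for k, labels in framelabels:
        for lbl in list(active):
            if lbl not in labels:  # the run ended on the previous frame
                intervals.setdefault(lbl, []).append((active.pop(lbl), last))
        for lbl in labels:
            active.setdefault(lbl, k)  # start a new run if not already active
        last = k
    for lbl, start in active.items():  # close runs still open at the end
        intervals.setdefault(lbl, []).append((start, last))
    return intervals

# Toy input shaped like (frame index, label set) tuples; the indices are made up.
toy = [(0, {'car_drops_off_person', 'car_stops'}),
       (1, {'car_drops_off_person', 'car_stops'}),
       (2, {'car_drops_off_person'}),
       (3, {'car_drops_off_person', 'car_starts'})]
print(label_intervals(toy))
```

Each label gets its own list of intervals, so overlapping activities stay independent of one another.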
The use of joint activity labels means that activities can occur simultaneously. A single actor can be performing more than one activity at the same time, which means that a loss that assumes one-hot ground truth labels (e.g. categorical cross-entropy) is an inappropriate choice for training. Instead, we recommend a framewise multi-label loss that can be trained with multiple simultaneous labels per frame (e.g. binary cross-entropy).
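As a minimal sketch of the target encoding this implies, a set of labels for a frame can be converted to a multi-hot vector and scored with binary cross-entropy. The class list, the `multi_hot` and `binary_cross_entropy` helpers, and the example probabilities are illustrative assumptions and are not part of vipy; the sketch uses only the Python standard library so that the arithmetic is explicit.

```python
import math

# Hypothetical class vocabulary, using the labels printed for this video.
CLASSES = ['car_drops_off_person', 'car_starts', 'car_stops']

def multi_hot(labels, classes=CLASSES):
    """Encode a set of simultaneous labels as a 0/1 vector (one slot per class)."""
    return [1.0 if c in labels else 0.0 for c in classes]

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean per-class binary cross-entropy; each class is an independent yes/no decision."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)

# A frame with two labels at once, which a one-hot target cannot represent.
target = multi_hot({'car_drops_off_person', 'car_stops'})
good = binary_cross_entropy([0.9, 0.1, 0.8], target)  # confident in both true classes
bad = binary_cross_entropy([0.9, 0.1, 0.1], target)   # misses 'car_stops'
print(target, good < bad)
```

In a deep learning framework, the same idea corresponds to a sigmoid output per class rather than a softmax over classes.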
We recommend running your proposal generation pipeline on these videos to output your own object tracks for encoding the clips for training. This ensures that the tracks used to represent activities have the same bounding box style as the tracks your pipeline produces.
For example, the following code will run an object detector on each frame of video, and compute the intersection of the returned object detections with the ground truth using a greedy bounding box assignment based on bounding box intersection over union. You can use the resulting annotated frame (e.g. imdet.objects()) as a replacement for the ground truth with your proposals.
from pycollector.detection import ObjectDetector
detect = ObjectDetector()
for im in V[3].mindim(512).stream():
    im.show() # the original labeled image
    imdet = detect(im).show() # the new detections
    imdet.intersection(im, miniou=0.8, bycategory=False).show() # the best assignment of your detection to the truth (if any)
    break
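The greedy assignment idea described above can be sketched independently of any detector. This is an illustration under stated assumptions, not the pycollector or vipy implementation: boxes are plain (xmin, ymin, xmax, ymax) tuples, and the `iou` and `greedy_assign` helpers and the example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_assign(detections, truths, miniou=0.8):
    """Greedily pair detections with truth boxes in order of decreasing IoU.

    Returns a list of (detection_index, truth_index, iou); each box is used at most once.
    """
    pairs = sorted(((iou(d, t), i, j) for i, d in enumerate(detections)
                    for j, t in enumerate(truths)), reverse=True)
    used_d, used_t, matches = set(), set(), []
    for score, i, j in pairs:
        if score < miniou:
            break  # all remaining pairs overlap even less
        if i not in used_d and j not in used_t:
            matches.append((i, j, score))
            used_d.add(i)
            used_t.add(j)
    return matches

truths = [(0, 0, 100, 100)]
detections = [(50, 50, 150, 150), (2, 2, 100, 100)]
print(greedy_assign(detections, truths))
```

Greedy matching is not guaranteed to be globally optimal, but it is simple and is usually adequate when the IoU threshold is high.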
The MEVA annotation requirements include class-specific temporal padding, which introduces up to five seconds of padding before and after each activity. To be strictly consistent with the MEVA annotation definitions, we have introduced the MEVA padding as a post-processing step. However, this padding can introduce label errors during training, because background frames are purposely mislabeled as the target class. Furthermore, this padding may cause otherwise disjoint activities to overlap (e.g. opening and closing simultaneously). Our videos were collected with tight and disjoint temporal boundaries as determined by the collectors when the videos were recorded. We recommend undoing the MEVA padding at training time, and using the tight framewise labels to precisely localize the activities as determined by the annotator. Then, the temporal padding may be re-introduced at test time. Contact us at info@visym.com and we will provide you with these precise framewise labels.
Our collection platform includes additional labels that can aid in your training. We subdivide broad classes into subclasses, which provide a more challenging task for training. For example, we break out the broad class "person_puts_down_object" into "person_puts_down_object_on_shelf", "person_puts_down_object_on_floor" and "person_puts_down_object_on_table". These are visually distinct activities, which can be rolled into a single class "person_puts_down_object", but we recommend using the sub-classes during training to reduce overfitting. Then, at test time, the original MEVA labels can be used.
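As a minimal sketch of rolling subclasses back into a broad class at test time, the mapping can be done by label prefix. This assumes that every subclass label begins with its broad class name followed by an underscore, as in the example above; the `to_broad_class` helper and the `BROAD_CLASSES` list are hypothetical and would need to be extended for the full label set.

```python
# Broad classes named in the text above; extend as needed for your label set.
BROAD_CLASSES = ['person_puts_down_object']

def to_broad_class(label, broad_classes=BROAD_CLASSES):
    """Map a subclass label to its broad class by prefix; leave other labels unchanged."""
    for broad in sorted(broad_classes, key=len, reverse=True):  # longest prefix wins
        if label == broad or label.startswith(broad + '_'):
            return broad
    return label

subclasses = ['person_puts_down_object_on_shelf',
              'person_puts_down_object_on_floor',
              'person_puts_down_object_on_table',
              'person_talks_on_phone']
print([to_broad_class(s) for s in subclasses])
```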
Also, the collection platform includes additional weak labels that can aid in your training. These labels are stored as metadata for each video and include:
You can access this metadata as a dictionary or by casting a video to a Collector Video object.
V[-1].metadata()
{'collection_id': 'b37f9ace-77ea-4659-8781-5e59176dcd25',
'video_id': 'E0E65439-B009-4215-9F95-BA19BC6A5CD8',
'ipAddress': '109.245.32.25',
'duration': 10,
'app_version': '1.0.22',
'os_version': '13.6.1',
'collection_name': 'Unload something from a rear door',
'program_name': 'MEVA',
'device_identifier': 'ios',
'subject_ids': ['10559b39-25a9-44bb-b9fa-44d6eb2e96ab'],
'device_type': 'iPhone6S',
'frame_rate': 29.974859795010634,
'frame_width': 1080,
'collected_date': '2020-08-23 06:36:46',
'collector_id': '2c4dd6fd-b71a-4850-a688-f3b7761835d4',
'blurred_faces': 0,
'project_name': 'MEVA Car',
'frame_height': 1920,
'project_id': '4c66a969-892c-4114-b711-45b2af02244a',
'orientation': 'portrait',
'rotate': None}
from pycollector.video import Video
v = Video.cast(V[-1])
print(v.geolocation())
print(v.uploaded())
{'ip': '109.245.32.25', 'host': '109.245.32.25', 'isp': 'Telenor d.o.o. Beograd', 'city': 'Belgrade', 'countrycode': 'RS', 'countryname': 'Serbia', 'latitude': '44.8166', 'longitude': '20.4721'}
2020-08-23 06:36:46-04:00
Our collections organize activities into groups to introduce diversity in the scene. For example, we instruct the collectors to load and unload from both a trunk and a rear door of a vehicle to help introduce intra-class diversity. Also, we introduce joint activities such as "Leave this scene while talking on a phone". The full list of collection names is self-explanatory and is available as follows. Classes may be filtered to remove variants that may not reflect the target domain bias (e.g. motorcycles are not present in MEVA), or that do not satisfy the assumptions of the loss function (e.g. joint activities).
print(set([v.metadata()['collection_name'] for v in V if 'collection_name' in v.metadata()]))
{'Greet a friend with a handshake while sitting', 'Hand something to your friend', 'Carry a heavy object while walking and put it down on the floor', 'Drive car turning around', 'Drive car turning right while starting', 'Come into a scene through an opening', 'Drive car turning right', 'Drive car turning left while stopping', 'Pick up, then walk and carry a heavy object', 'Talk while fidgeting', 'Walk and talk', 'Load something into a rear door', 'Walk while talking and texting on your phone', 'Steal something from your friend and walk away', 'Sit and read a document', 'Leave this scene through a closed door while talking on a phone', 'Pick up an object from a table and put it down on a high shelf', 'Greet a friend with a handshake then talk while standing', 'Pick up a document from a table and read', 'Come into this scene through a closed door', 'Put down a package on the table and walk away', 'Ride a Bicycle', 'Drive car backwards', 'Talk on phone and fidget', 'Read and fidget', 'Drive car turning right while stopping', 'Purchase something from a machine', 'Pick up an object from the floor and put it down on a nearby table', 'Greet a friend with a hug then chat while standing', 'Leave this scene through a closed door', 'Leave this scene through an opening while talking on a phone', 'Quickstart', 'Drop off passenger from car', 'Put down a package on the floor and walk away', 'Leave this scene through an opening', 'Pick up an object from the floor while seated and put on the other side of you', 'Drive car turning left ', 'Drive car backwards while turning left', 'Use a laptop and fidget', 'Come into a scene through an opening while talking on a phone', 'Sit and use laptop at table', 'Sit and use laptop on lap', 'Greet a friend with a handshake then talk while walking', 'Pick up an object from the floor and put it down on a neaby shelf', 'Drive car turning left while starting', 'Pick up an object from a shelf and put it down on a table', 'Get into a car using a front door', 'Pick up passenger in car', 'Drive car backwards while turning right', 'Come into this scene through a closed door while talking on a phone', 'Greet a sitting friend with a hug', 'Get out of a car using a front door', 'Unload something from a rear door', 'Greet a friend with a hug then chat while walking ', 'Pick up an object from a table and put it down on the floor', 'Get into a car using a rear door', 'Get out of a car using a rear door', 'Hold hands while walking and talking'}
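Filtering by collection name can be sketched as follows. The records are hypothetical stand-ins for the dictionaries returned by `v.metadata()`, and the exclusion list is an arbitrary illustration rather than a recommendation; choose the names that do not match your own target domain or loss function.

```python
# Hypothetical metadata records standing in for v.metadata() of each video.
records = [
    {'video_id': 'a', 'collection_name': 'Ride a Bicycle'},
    {'video_id': 'b', 'collection_name': 'Unload something from a rear door'},
    {'video_id': 'c', 'collection_name': 'Walk while talking and texting on your phone'},
    {'video_id': 'd'},  # some videos may lack a collection name
]

# Illustrative exclusion list: one out-of-domain variant and one joint activity.
EXCLUDED = {'Ride a Bicycle', 'Walk while talking and texting on your phone'}

def keep(metadata, excluded=EXCLUDED):
    """Keep a video unless its collection name is explicitly excluded."""
    return metadata.get('collection_name') not in excluded

kept = [r['video_id'] for r in records if keep(r)]
print(kept)
```

With the dataset loaded, the same predicate could be applied as `[v for v in V if keep(v.metadata())]`.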
Our pipeline supports optical-flow-based stabilization of video. This stabilizes the background by reducing the artifacts due to hand-held cameras. Remaining artifacts are due to non-planar scenes, rolling shutter distortion and subpixel optical flow correspondence errors. The stabilization is only valid within the tracked actor bounding box for small camera motions. Large motions will introduce stabilization artifacts due to non-planar scene effects, and such videos should be filtered out prior to usage. The stabilization artifacts will manifest as a slightly shifting background relative to the actor, which may affect flow-based methods.
The pip-175k-stabilized release was constructed by running this stabilization on all videos and updating the object boxes accordingly. You can run this yourself as shown below, or use the public release. You can use the "stabilize" attribute, which stores the stabilization residual, to filter out videos with too large a distortion.
d = vipy.util.groupbyasdict(V, lambda v: v.category())
v = d['person_carries_heavy_object'][0].mindim(256).stabilize()
v.frame(0).show()
v.frame(150).show()
print(v.getattribute('stabilize')) # the stabilization residual for filtering poorly stabilized videos
[vipy.flow.stabilize]: Affine coarse to fine stabilization ...
{'mean residual': 1.3057592780679719, 'median residual': 0.9582987531289169}
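Filtering on the residual can be sketched as a simple threshold test over dictionaries shaped like the output above. The threshold, the second record, and the `well_stabilized` helper are assumptions made for illustration; an appropriate cutoff depends on your frame size and method and should be tuned on your own data.

```python
# Residual dictionaries shaped like the output of v.getattribute('stabilize') above.
# The second entry and the threshold are made-up values for illustration.
residuals = {
    'video_a': {'mean residual': 1.3057592780679719, 'median residual': 0.9582987531289169},
    'video_b': {'mean residual': 7.5, 'median residual': 6.1},
}

MAX_MEAN_RESIDUAL = 3.0  # hypothetical threshold

def well_stabilized(residual, max_mean=MAX_MEAN_RESIDUAL):
    """True if a residual is present and its mean is at or below the threshold."""
    return residual is not None and residual['mean residual'] <= max_mean

kept = sorted(k for k, r in residuals.items() if well_stabilized(r))
print(kept)
```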
You can export torch or numpy arrays, or just transcode your videos for native ingestion into your pipeline at the appropriate frame size.
d = vipy.util.groupbyasdict(V, lambda v: v.category())
v = d['person_carries_heavy_object'][0]
v = v.crop(v.trackbox().dilate(1.5).maxsquare()).mindim(224).saveas('/tmp/out.mp4')
v.thumbnail(frame=0).show()
v.show(notebook=True)
[vipy.video.annotate]: Annotating video ...
v.torch().shape # export the transcoded video as a torch tensor
torch.Size([282, 3, 224, 224])
v.json() # Export the metadata as a JSON encoded string
/Users/jba3139/dev/vipy/vipy/video.py:423: UserWarning: JSON serialization of video requires flushed buffers, will not include the loaded video. Try store()/restore()/unstore() instead to serialize videos as standalone objects efficiently.
warnings.warn("JSON serialization of video requires flushed buffers, will not include the loaded video. Try store()/restore()/unstore() instead to serialize videos as standalone objects efficiently.")
'{"_filename":"\\/tmp\\/out.mp4","_url":null,"_framerate":30,"_array":null,"_colorspace":"rgb","attributes":{"blurred_faces":0,"collected_date":"2020-05-06 15:27:25","collection_id":"P004C006","collector_id":"533e5fb295","device_identifier":"android","device_type":"CPH1969","duration":16,"frame_height":1920,"frame_rate":30.0,"frame_width":1080,"orientation":"portrait","os_version":"28","project_id":"P004","subject_ids":["20200506_1527244575402164829910826"],"video_id":"20200506_1527244575402164829910826","rotate":null},"_startframe":null,"_endframe":null,"_endsec":null,"_startsec":null,"_ffmpeg":"ffmpeg -i \\/tmp\\/out.mp4 dummyfile","_category":"person_carries_heavy_object","_tracks":{"c3005754eb9011ea9217ac1f6b2c363c":{"_id":"c3005754eb9011ea9217ac1f6b2c363c","_label":"person","_shortlabel":"person","_framerate":null,"_interpolation":"linear","_boundary":"strict","attributes":{},"_keyframes":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281],"_keyboxes":[{"_xmin":65.93,"_ymin":37.92,"_xmax":158.88,"_ymax":186.65},
{"_xmin":65.6,"_ymin":38.52,"_xmax":158.6,"_ymax":186.87},{"_xmin":65.58,"_ymin":38.48,"_xmax":158.84,"_ymax":186.85},{"_xmin":65.55,"_ymin":38.5,"_xmax":159.0,"_ymax":186.84},{"_xmin":65.84,"_ymin":38.81,"_xmax":159.71,"_ymax":187.54},{"_xmin":65.81,"_ymin":37.99,"_xmax":159.76,"_ymax":186.6},{"_xmin":65.78,"_ymin":38.14,"_xmax":159.76,"_ymax":186.6},{"_xmin":66.08,"_ymin":38.55,"_xmax":160.34,"_ymax":187.31},{"_xmin":65.13,"_ymin":38.76,"_xmax":159.34,"_ymax":187.32},{"_xmin":65.15,"_ymin":38.03,"_xmax":159.27,"_ymax":186.4},{"_xmin":65.5,"_ymin":38.48,"_xmax":159.82,"_ymax":187.12},{"_xmin":65.56,"_ymin":38.7,"_xmax":159.75,"_ymax":187.15},{"_xmin":65.97,"_ymin":38.2,"_xmax":160.33,"_ymax":186.95},{"_xmin":65.14,"_ymin":38.39,"_xmax":159.37,"_ymax":186.99},{"_xmin":65.3,"_ymin":38.55,"_xmax":159.39,"_ymax":187.04},{"_xmin":65.49,"_ymin":38.68,"_xmax":159.44,"_ymax":187.09},{"_xmin":65.71,"_ymin":37.85,"_xmax":159.54,"_ymax":186.21},{"_xmin":65.96,"_ymin":37.93,"_xmax":159.66,"_ymax":186.27},{"_xmin":65.29,"_ymin":37.98,"_xmax":158.86,"_ymax":186.35},{"_xmin":65.59,"_ymin":38.0,"_xmax":159.04,"_ymax":186.43},{"_xmin":65.9,"_ymin":38.0,"_xmax":159.24,"_ymax":186.51},{"_xmin":66.24,"_ymin":37.96,"_xmax":159.45,"_ymax":186.6},{"_xmin":66.25,"_ymin":38.6,"_xmax":159.06,"_ymax":186.93},{"_xmin":66.6,"_ymin":38.5,"_xmax":159.3,"_ymax":187.02},{"_xmin":66.01,"_ymin":38.37,"_xmax":158.61,"_ymax":187.12},{"_xmin":66.05,"_ymin":37.97,"_xmax":158.25,"_ymax":186.52},{"_xmin":66.08,"_ymin":38.48,"_xmax":157.9,"_ymax":186.86},{"_xmin":66.42,"_ymin":38.25,"_xmax":158.16,"_ymax":186.97},{"_xmin":67.38,"_ymin":38.68,"_xmax":158.73,"_ymax":187.31},{"_xmin":67.38,"_ymin":38.15,"_xmax":158.37,"_ymax":186.73},{"_xmin":67.37,"_ymin":38.52,"_xmax":158.0,"_ymax":187.07},{"_xmin":67.34,"_ymin":37.92,"_xmax":157.62,"_ymax":186.49},{"_xmin":67.29,"_ymin":38.21,"_xmax":157.22,"_ymax":186.84},{"_xmin":68.14,"_ymin":38.47,"_xmax":157.73,"_ymax":187.18},{"_xmin":67.71,"_ymin":38.45,"_xmax":156.
7,"_ymax":186.84},{"_xmin":68.49,"_ymin":38.63,"_xmax":157.16,"_ymax":187.18},{"_xmin":68.33,"_ymin":38.77,"_xmax":156.68,"_ymax":187.52},{"_xmin":68.72,"_ymin":38.64,"_xmax":156.49,"_ymax":187.18},{"_xmin":69.06,"_ymin":38.47,"_xmax":156.28,"_ymax":186.84},{"_xmin":69.67,"_ymin":38.49,"_xmax":156.6,"_ymax":187.17},{"_xmin":69.03,"_ymin":38.25,"_xmax":155.42,"_ymax":186.83},{"_xmin":69.24,"_ymin":37.97,"_xmax":155.1,"_ymax":186.49},{"_xmin":70.27,"_ymin":38.54,"_xmax":155.62,"_ymax":187.03},{"_xmin":70.36,"_ymin":38.18,"_xmax":155.2,"_ymax":186.69},{"_xmin":70.37,"_ymin":38.66,"_xmax":154.75,"_ymax":187.22},{"_xmin":70.22,"_ymin":38.23,"_xmax":154.28,"_ymax":186.88},{"_xmin":70.65,"_ymin":38.64,"_xmax":154.65,"_ymax":187.41},{"_xmin":70.37,"_ymin":37.97,"_xmax":154.48,"_ymax":186.44},{"_xmin":69.97,"_ymin":38.34,"_xmax":154.87,"_ymax":186.98},{"_xmin":69.9,"_ymin":38.5,"_xmax":155.6,"_ymax":186.89},{"_xmin":69.37,"_ymin":38.03,"_xmax":156.08,"_ymax":186.61},{"_xmin":68.91,"_ymin":38.2,"_xmax":156.09,"_ymax":186.57},{"_xmin":68.48,"_ymin":38.59,"_xmax":155.89,"_ymax":187.15},{"_xmin":69.11,"_ymin":37.97,"_xmax":156.05,"_ymax":186.33},{"_xmin":69.63,"_ymin":38.4,"_xmax":155.98,"_ymax":186.96},{"_xmin":69.64,"_ymin":38.06,"_xmax":155.14,"_ymax":186.81},{"_xmin":70.38,"_ymin":38.36,"_xmax":154.61,"_ymax":186.89},{"_xmin":71.52,"_ymin":38.11,"_xmax":154.61,"_ymax":186.81},{"_xmin":71.6,"_ymin":38.51,"_xmax":153.29,"_ymax":186.97},{"_xmin":72.7,"_ymin":38.38,"_xmax":153.24,"_ymax":186.96},{"_xmin":72.9,"_ymin":38.3,"_xmax":152.36,"_ymax":187.0},{"_xmin":73.49,"_ymin":38.08,"_xmax":151.73,"_ymax":186.46},{"_xmin":73.52,"_ymin":38.05,"_xmax":150.79,"_ymax":186.5},{"_xmin":74.28,"_ymin":38.01,"_xmax":150.58,"_ymax":186.48},{"_xmin":75.02,"_ymin":37.92,"_xmax":150.33,"_ymax":186.4},{"_xmin":75.03,"_ymin":38.52,"_xmax":149.3,"_ymax":186.97},{"_xmin":75.84,"_ymin":38.26,"_xmax":149.0,"_ymax":186.65},{"_xmin":76.26,"_ymin":38.06,"_xmax":148.39,"_ymax":186.74},{"_xmin":77.25,"_ym
in":38.33,"_xmax":148.05,"_ymax":186.85},{"_xmin":77.84,"_ymin":38.19,"_xmax":147.41,"_ymax":186.68},{"_xmin":77.99,"_ymin":38.61,"_xmax":146.46,"_ymax":187.07},{"_xmin":79.15,"_ymin":38.69,"_xmax":146.7,"_ymax":187.36},{"_xmin":79.2,"_ymin":38.6,"_xmax":145.72,"_ymax":187.26},{"_xmin":79.89,"_ymin":38.57,"_xmax":145.48,"_ymax":187.25},{"_xmin":80.42,"_ymin":37.94,"_xmax":145.24,"_ymax":186.51},{"_xmin":80.29,"_ymin":38.45,"_xmax":144.69,"_ymax":187.03},{"_xmin":80.7,"_ymin":38.3,"_xmax":144.9,"_ymax":186.74},{"_xmin":80.45,"_ymin":38.29,"_xmax":144.83,"_ymax":186.99},{"_xmin":80.45,"_ymin":38.66,"_xmax":145.07,"_ymax":187.41},{"_xmin":79.96,"_ymin":38.53,"_xmax":144.87,"_ymax":187.26},{"_xmin":79.85,"_ymin":38.01,"_xmax":144.99,"_ymax":186.57},{"_xmin":79.77,"_ymin":38.15,"_xmax":145.16,"_ymax":186.67},{"_xmin":79.84,"_ymin":38.34,"_xmax":145.36,"_ymax":186.84},{"_xmin":79.41,"_ymin":37.91,"_xmax":144.86,"_ymax":186.32},{"_xmin":80.24,"_ymin":38.6,"_xmax":145.61,"_ymax":187.18},{"_xmin":79.76,"_ymin":37.97,"_xmax":144.86,"_ymax":186.59},{"_xmin":80.15,"_ymin":38.27,"_xmax":144.86,"_ymax":186.79},{"_xmin":80.18,"_ymin":38.18,"_xmax":144.54,"_ymax":186.83},{"_xmin":81.0,"_ymin":38.19,"_xmax":144.93,"_ymax":186.88},{"_xmin":81.05,"_ymin":38.22,"_xmax":144.5,"_ymax":186.94},{"_xmin":80.76,"_ymin":38.0,"_xmax":143.54,"_ymax":186.41},{"_xmin":81.44,"_ymin":38.65,"_xmax":143.72,"_ymax":187.2},{"_xmin":81.74,"_ymin":38.27,"_xmax":143.38,"_ymax":186.65},{"_xmin":82.22,"_ymin":38.0,"_xmax":143.46,"_ymax":186.65},{"_xmin":82.27,"_ymin":38.22,"_xmax":143.03,"_ymax":186.82},{"_xmin":82.17,"_ymin":38.35,"_xmax":142.58,"_ymax":186.97},{"_xmin":82.66,"_ymin":38.39,"_xmax":142.86,"_ymax":187.11},{"_xmin":82.67,"_ymin":38.15,"_xmax":142.69,"_ymax":186.65},{"_xmin":82.78,"_ymin":38.76,"_xmax":142.97,"_ymax":187.5},{"_xmin":82.45,"_ymin":38.33,"_xmax":142.8,"_ymax":186.99},{"_xmin":82.02,"_ymin":38.57,"_xmax":142.67,"_ymax":187.22},{"_xmin":82.31,"_ymin":38.7,"_xmax":143.31,"_ymax":18
7.43},{"_xmin":81.56,"_ymin":38.54,"_xmax":142.8,"_ymax":187.04},{"_xmin":81.93,"_ymin":38.47,"_xmax":143.54,"_ymax":187.18},{"_xmin":82.12,"_ymin":38.1,"_xmax":143.89,"_ymax":186.74},{"_xmin":81.74,"_ymin":38.39,"_xmax":143.56,"_ymax":187.01},{"_xmin":81.51,"_ymin":38.58,"_xmax":143.3,"_ymax":187.26},{"_xmin":81.87,"_ymin":38.54,"_xmax":143.39,"_ymax":186.94},{"_xmin":81.9,"_ymin":37.93,"_xmax":143.24,"_ymax":186.43},{"_xmin":82.04,"_ymin":38.05,"_xmax":143.13,"_ymax":186.65},{"_xmin":82.26,"_ymin":38.19,"_xmax":143.05,"_ymax":186.88},{"_xmin":82.25,"_ymin":38.2,"_xmax":142.56,"_ymax":186.58},{"_xmin":82.59,"_ymin":38.46,"_xmax":142.51,"_ymax":186.85},{"_xmin":82.52,"_ymin":38.29,"_xmax":142.19,"_ymax":186.96},{"_xmin":82.9,"_ymin":38.06,"_xmax":142.15,"_ymax":186.58},{"_xmin":82.85,"_ymin":38.18,"_xmax":141.81,"_ymax":186.78},{"_xmin":83.48,"_ymin":38.47,"_xmax":142.17,"_ymax":187.02},{"_xmin":83.66,"_ymin":38.44,"_xmax":142.21,"_ymax":187.15},{"_xmin":83.79,"_ymin":38.65,"_xmax":142.21,"_ymax":187.35},{"_xmin":83.15,"_ymin":38.42,"_xmax":141.47,"_ymax":186.91},{"_xmin":83.41,"_ymin":37.96,"_xmax":141.8,"_ymax":186.38},{"_xmin":83.15,"_ymin":37.93,"_xmax":141.79,"_ymax":186.44},{"_xmin":82.8,"_ymin":38.64,"_xmax":141.72,"_ymax":187.2},{"_xmin":82.82,"_ymin":38.18,"_xmax":141.9,"_ymax":186.59},{"_xmin":82.78,"_ymin":38.57,"_xmax":142.04,"_ymax":187.19},{"_xmin":82.86,"_ymin":38.58,"_xmax":142.02,"_ymax":187.13},{"_xmin":82.93,"_ymin":38.29,"_xmax":141.97,"_ymax":186.94},{"_xmin":83.0,"_ymin":38.25,"_xmax":141.91,"_ymax":186.79},{"_xmin":83.38,"_ymin":38.13,"_xmax":142.27,"_ymax":186.71},{"_xmin":83.09,"_ymin":38.57,"_xmax":141.92,"_ymax":186.99},{"_xmin":83.14,"_ymin":38.03,"_xmax":142.01,"_ymax":186.59},{"_xmin":83.28,"_ymin":38.14,"_xmax":142.13,"_ymax":186.86},{"_xmin":83.22,"_ymin":38.35,"_xmax":141.84,"_ymax":187.05},{"_xmin":83.7,"_ymin":38.43,"_xmax":141.88,"_ymax":186.98},{"_xmin":83.6,"_ymin":38.05,"_xmax":141.25,"_ymax":186.64},{"_xmin":84.35,"_ymin":38.2
4,"_xmax":141.38,"_ymax":186.93},{"_xmin":84.26,"_ymin":38.39,"_xmax":140.44,"_ymax":186.8},{"_xmin":84.49,"_ymin":38.2,"_xmax":139.94,"_ymax":186.64},{"_xmin":85.35,"_ymin":38.2,"_xmax":140.12,"_ymax":186.65},{"_xmin":85.31,"_ymin":38.37,"_xmax":139.52,"_ymax":186.84},{"_xmin":85.7,"_ymin":38.04,"_xmax":139.51,"_ymax":186.53},{"_xmin":85.9,"_ymin":38.58,"_xmax":139.35,"_ymax":187.12},{"_xmin":86.01,"_ymin":38.6,"_xmax":138.99,"_ymax":187.24},{"_xmin":86.56,"_ymin":37.94,"_xmax":138.67,"_ymax":186.37},{"_xmin":86.84,"_ymin":38.28,"_xmax":137.83,"_ymax":186.94},{"_xmin":87.73,"_ymin":38.41,"_xmax":137.26,"_ymax":187.04},{"_xmin":88.86,"_ymin":38.28,"_xmax":136.91,"_ymax":187.0},{"_xmin":89.33,"_ymin":38.17,"_xmax":135.96,"_ymax":186.81},{"_xmin":90.12,"_ymin":38.15,"_xmax":135.64,"_ymax":186.85},{"_xmin":90.15,"_ymin":38.29,"_xmax":134.72,"_ymax":186.72},{"_xmin":90.78,"_ymin":38.09,"_xmax":134.77,"_ymax":186.74},{"_xmin":90.65,"_ymin":38.16,"_xmax":134.13,"_ymax":186.72},{"_xmin":90.68,"_ymin":38.01,"_xmax":133.8,"_ymax":186.53},{"_xmin":90.88,"_ymin":38.25,"_xmax":133.78,"_ymax":186.84},{"_xmin":90.97,"_ymin":37.99,"_xmax":133.7,"_ymax":186.5},{"_xmin":91.23,"_ymin":38.68,"_xmax":133.93,"_ymax":187.34},{"_xmin":91.1,"_ymin":37.93,"_xmax":133.73,"_ymax":186.4},{"_xmin":91.15,"_ymin":38.01,"_xmax":133.83,"_ymax":186.65},{"_xmin":91.47,"_ymin":38.54,"_xmax":134.17,"_ymax":187.13},{"_xmin":91.02,"_ymin":38.33,"_xmax":133.78,"_ymax":187.04},{"_xmin":91.53,"_ymin":38.65,"_xmax":134.29,"_ymax":187.36},{"_xmin":91.04,"_ymin":38.29,"_xmax":133.72,"_ymax":186.94},{"_xmin":91.56,"_ymin":38.06,"_xmax":134.05,"_ymax":186.63},{"_xmin":91.12,"_ymin":38.05,"_xmax":133.34,"_ymax":186.51},{"_xmin":91.35,"_ymin":38.29,"_xmax":133.26,"_ymax":186.84},{"_xmin":91.6,"_ymin":38.62,"_xmax":133.18,"_ymax":187.29},{"_xmin":91.91,"_ymin":38.29,"_xmax":133.09,"_ymax":186.94},{"_xmin":92.25,"_ymin":38.54,"_xmax":133.01,"_ymax":187.01},{"_xmin":92.55,"_ymin":38.42,"_xmax":133.04,"_ymax":187.11},
{"_xmin":92.89,"_ymin":38.24,"_xmax":133.09,"_ymax":186.89},{"_xmin":92.91,"_ymin":38.16,"_xmax":132.89,"_ymax":186.8},{"_xmin":92.6,"_ymin":38.18,"_xmax":132.47,"_ymax":186.84},{"_xmin":92.31,"_ymin":38.14,"_xmax":132.09,"_ymax":186.56},{"_xmin":92.59,"_ymin":38.36,"_xmax":132.47,"_ymax":186.89},{"_xmin":92.25,"_ymin":38.52,"_xmax":132.29,"_ymax":186.95},{"_xmin":92.21,"_ymin":38.15,"_xmax":132.52,"_ymax":186.61},{"_xmin":92.46,"_ymin":38.51,"_xmax":133.2,"_ymax":187.19},{"_xmin":91.82,"_ymin":38.01,"_xmax":132.98,"_ymax":186.54},{"_xmin":92.14,"_ymin":38.25,"_xmax":133.85,"_ymax":186.94},{"_xmin":91.58,"_ymin":38.26,"_xmax":133.82,"_ymax":186.86},{"_xmin":91.05,"_ymin":38.21,"_xmax":133.9,"_ymax":186.83},{"_xmin":90.95,"_ymin":38.59,"_xmax":134.38,"_ymax":187.07},{"_xmin":90.87,"_ymin":38.22,"_xmax":134.96,"_ymax":186.8},{"_xmin":90.36,"_ymin":38.17,"_xmax":135.08,"_ymax":186.71},{"_xmin":90.15,"_ymin":38.48,"_xmax":135.57,"_ymax":187.13},{"_xmin":89.77,"_ymin":38.2,"_xmax":135.84,"_ymax":186.73},{"_xmin":89.37,"_ymin":38.12,"_xmax":136.14,"_ymax":186.58},{"_xmin":88.48,"_ymin":38.41,"_xmax":136.0,"_ymax":186.82},{"_xmin":88.32,"_ymin":38.17,"_xmax":136.72,"_ymax":186.81},{"_xmin":87.94,"_ymin":37.96,"_xmax":137.14,"_ymax":186.43},{"_xmin":87.45,"_ymin":38.31,"_xmax":137.58,"_ymax":186.83},{"_xmin":87.25,"_ymin":38.38,"_xmax":138.33,"_ymax":186.85},{"_xmin":86.9,"_ymin":38.26,"_xmax":139.04,"_ymax":186.97},{"_xmin":85.8,"_ymin":38.23,"_xmax":138.88,"_ymax":186.81},{"_xmin":85.52,"_ymin":38.38,"_xmax":139.52,"_ymax":186.84},{"_xmin":84.95,"_ymin":38.14,"_xmax":139.96,"_ymax":186.85},{"_xmin":84.84,"_ymin":38.45,"_xmax":140.7,"_ymax":187.15},{"_xmin":84.48,"_ymin":37.9,"_xmax":141.01,"_ymax":186.35},{"_xmin":83.73,"_ymin":38.19,"_xmax":141.0,"_ymax":186.81},{"_xmin":83.4,"_ymin":38.28,"_xmax":141.23,"_ymax":186.83},{"_xmin":83.08,"_ymin":38.36,"_xmax":141.43,"_ymax":186.91},{"_xmin":83.46,"_ymin":38.42,"_xmax":142.27,"_ymax":187.04},{"_xmin":82.9,"_ymin":38.3,"_xmax
":142.0,"_ymax":186.7},{"_xmin":82.63,"_ymin":38.33,"_xmax":142.1,"_ymax":186.91},{"_xmin":82.79,"_ymin":38.18,"_xmax":142.44,"_ymax":186.63},{"_xmin":82.55,"_ymin":38.19,"_xmax":142.49,"_ymax":186.88},{"_xmin":82.74,"_ymin":38.02,"_xmax":142.79,"_ymax":186.63},{"_xmin":82.27,"_ymin":38.51,"_xmax":142.38,"_ymax":187.05},{"_xmin":82.49,"_ymin":38.32,"_xmax":142.63,"_ymax":186.8},{"_xmin":82.06,"_ymin":38.13,"_xmax":142.2,"_ymax":186.53},{"_xmin":82.59,"_ymin":38.1,"_xmax":142.81,"_ymax":186.74},{"_xmin":82.19,"_ymin":38.56,"_xmax":142.35,"_ymax":187.09},{"_xmin":82.75,"_ymin":38.52,"_xmax":142.94,"_ymax":187.24},{"_xmin":82.39,"_ymin":38.31,"_xmax":142.45,"_ymax":186.84},{"_xmin":82.31,"_ymin":38.27,"_xmax":142.35,"_ymax":186.92},{"_xmin":82.66,"_ymin":38.06,"_xmax":142.51,"_ymax":186.46},{"_xmin":82.63,"_ymin":38.02,"_xmax":142.4,"_ymax":186.48},{"_xmin":82.61,"_ymin":37.98,"_xmax":142.29,"_ymax":186.47},{"_xmin":82.61,"_ymin":37.95,"_xmax":142.17,"_ymax":186.44},{"_xmin":82.64,"_ymin":38.59,"_xmax":142.06,"_ymax":187.06},{"_xmin":82.69,"_ymin":38.56,"_xmax":141.94,"_ymax":187.0},{"_xmin":83.02,"_ymin":38.05,"_xmax":142.23,"_ymax":186.77},{"_xmin":83.11,"_ymin":38.04,"_xmax":142.13,"_ymax":186.69},{"_xmin":83.22,"_ymin":38.04,"_xmax":142.04,"_ymax":186.6},{"_xmin":83.36,"_ymin":38.05,"_xmax":141.95,"_ymax":186.51},{"_xmin":83.78,"_ymin":38.25,"_xmax":142.27,"_ymax":186.92},{"_xmin":83.3,"_ymin":38.28,"_xmax":141.54,"_ymax":186.84},{"_xmin":83.5,"_ymin":38.34,"_xmax":141.49,"_ymax":186.76},{"_xmin":83.99,"_ymin":38.57,"_xmax":141.86,"_ymax":187.19},{"_xmin":83.56,"_ymin":37.99,"_xmax":141.17,"_ymax":186.44},{"_xmin":84.08,"_ymin":38.25,"_xmax":141.56,"_ymax":186.88},{"_xmin":83.66,"_ymin":38.37,"_xmax":140.89,"_ymax":186.8},{"_xmin":84.18,"_ymin":38.01,"_xmax":141.3,"_ymax":186.56},{"_xmin":84.03,"_ymin":38.33,"_xmax":141.05,"_ymax":186.98},{"_xmin":84.54,"_ymin":38.68,"_xmax":141.48,"_ymax":187.4},{"_xmin":84.09,"_ymin":38.2,"_xmax":140.84,"_ymax":186.63},{"_xmin":8
3.89,"_ymin":38.58,"_xmax":140.61,"_ymax":187.03},{"_xmin":84.34,"_ymin":38.3,"_xmax":141.06,"_ymax":186.74},{"_xmin":84.09,"_ymin":38.01,"_xmax":140.84,"_ymax":186.43},{"_xmin":84.07,"_ymin":38.57,"_xmax":141.02,"_ymax":187.3},{"_xmin":84.42,"_ymin":38.27,"_xmax":141.49,"_ymax":186.95},{"_xmin":84.04,"_ymin":38.64,"_xmax":141.27,"_ymax":187.28},{"_xmin":83.61,"_ymin":38.3,"_xmax":141.05,"_ymax":186.89},{"_xmin":83.82,"_ymin":37.95,"_xmax":141.51,"_ymax":186.5},{"_xmin":83.28,"_ymin":38.4,"_xmax":141.29,"_ymax":186.95},{"_xmin":83.4,"_ymin":38.38,"_xmax":141.76,"_ymax":187.04},{"_xmin":83.2,"_ymin":38.57,"_xmax":141.8,"_ymax":187.12},{"_xmin":82.98,"_ymin":38.49,"_xmax":141.85,"_ymax":187.09},{"_xmin":83.43,"_ymin":38.68,"_xmax":142.58,"_ymax":187.36},{"_xmin":83.17,"_ymin":38.18,"_xmax":142.61,"_ymax":186.88},{"_xmin":82.91,"_ymin":38.14,"_xmax":142.63,"_ymax":186.83},{"_xmin":82.65,"_ymin":38.37,"_xmax":142.64,"_ymax":187.04},{"_xmin":82.4,"_ymin":38.2,"_xmax":142.64,"_ymax":186.86},{"_xmin":82.17,"_ymin":38.4,"_xmax":142.63,"_ymax":187.06},{"_xmin":82.65,"_ymin":38.36,"_xmax":143.3,"_ymax":187.03},{"_xmin":82.47,"_ymin":38.16,"_xmax":143.26,"_ymax":186.84},{"_xmin":82.33,"_ymin":38.57,"_xmax":143.21,"_ymax":187.28},{"_xmin":81.96,"_ymin":38.1,"_xmax":142.73,"_ymax":186.51},{"_xmin":81.91,"_ymin":38.56,"_xmax":142.65,"_ymax":187.01},{"_xmin":81.91,"_ymin":38.39,"_xmax":142.55,"_ymax":186.9},{"_xmin":82.65,"_ymin":38.28,"_xmax":143.14,"_ymax":186.86},{"_xmin":82.74,"_ymin":38.23,"_xmax":143.04,"_ymax":186.87},{"_xmin":82.86,"_ymin":38.22,"_xmax":142.93,"_ymax":186.92},{"_xmin":82.74,"_ymin":38.08,"_xmax":142.42,"_ymax":186.49},{"_xmin":82.92,"_ymin":38.15,"_xmax":142.33,"_ymax":186.6},{"_xmin":83.13,"_ymin":38.25,"_xmax":142.25,"_ymax":186.73},{"_xmin":83.36,"_ymin":38.37,"_xmax":142.19,"_ymax":186.86},{"_xmin":82.92,"_ymin":38.51,"_xmax":141.46,"_ymax":186.98},{"_xmin":83.17,"_ymin":37.96,"_xmax":141.45,"_ymax":186.39},{"_xmin":83.72,"_ymin":38.29,"_xmax":141.89,"
_ymax":186.99},{"_xmin":83.3,"_ymin":38.44,"_xmax":141.25,"_ymax":187.03},{"_xmin":83.58,"_ymin":38.58,"_xmax":141.34,"_ymax":187.02},{"_xmin":83.44,"_ymin":38.18,"_xmax":141.2,"_ymax":186.78},{"_xmin":83.71,"_ymin":38.22,"_xmax":141.39,"_ymax":186.63},{"_xmin":83.36,"_ymin":38.56,"_xmax":140.93,"_ymax":186.99},{"_xmin":84.25,"_ymin":37.98,"_xmax":141.52,"_ymax":186.53},{"_xmin":84.09,"_ymin":38.12,"_xmax":140.65,"_ymax":186.69},{"_xmin":84.96,"_ymin":38.21,"_xmax":140.42,"_ymax":186.78},{"_xmin":85.3,"_ymin":38.1,"_xmax":139.47,"_ymax":186.49},{"_xmin":86.13,"_ymin":38.12,"_xmax":139.3,"_ymax":186.68},{"_xmin":86.29,"_ymin":38.12,"_xmax":138.83,"_ymax":186.64},{"_xmin":86.55,"_ymin":38.09,"_xmax":139.09,"_ymax":186.57}]}},"_activities":{"c30054e8eb9011ea9217ac1f6b2c363c":{"_id":"c30054e8eb9011ea9217ac1f6b2c363c","_startframe":0,"_endframe":285,"_framerate":30.0,"_label":"person_carries_heavy_object","_shortlabel":"carrying","_trackid":["c3005754eb9011ea9217ac1f6b2c363c"],"_actorid":null,"attributes":{"blurred_faces":0,"collected_date":"2020-05-06 15:27:25","collection_id":"P004C006","collector_id":"XXXX@gmail.com","device_identifier":"android","device_type":"CPH1969","duration":16,"frame_height":1920,"frame_rate":30.0,"frame_width":1080,"orientation":"portrait","os_version":"28","project_id":"P004","subject_ids":["20200506_1527244575402164829910826"],"video_id":"20200506_1527244575402164829910826","rotate":null}}}}'
We recommend that in addition to your planned training data augmentation, you introduce scale jittering. The MEVA dataset includes many ultra-tiny people walking far from the camera. The PIP dataset does not include these tiny people, but the scale variation can be introduced synthetically by downsampling the crops to an appropriate resolution to best match the target domain shift prior to training.
v.thumbnail(frame=2).show().mindim(32).mindim(256).show()
<vipy.image.scene: height=256, width=256, color=rgb, category="person_carries_heavy_object", objects=1>
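To vary the synthetic scale rather than always downsampling to 32 pixels, the intermediate size can be sampled at random. The sketch below draws sizes log-uniformly, so that small scales are sampled as often as large ones in relative terms; the `sample_mindim` helper, the range of 32 to 256 and the log-uniform choice are all assumptions, and the best distribution depends on the target domain.

```python
import math
import random

def sample_mindim(rng, lo=32, hi=256):
    """Sample an intermediate minimum dimension log-uniformly from [lo, hi]."""
    return int(round(math.exp(rng.uniform(math.log(lo), math.log(hi)))))

rng = random.Random(0)  # seeded only to make the sketch repeatable
sizes = [sample_mindim(rng) for _ in range(5)]
print(sizes)

# Hypothetical use with the calls shown above: downsample to s, then resize back to 256.
# for s in sizes:
#     v.thumbnail(frame=2).mindim(s).mindim(256)
```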